---
id: tutorial-introduction
title: Introduction
sidebar_label: Introduction
---

**DeepEval** is a powerful open-source LLM evaluation framework. These tutorials show you how to use DeepEval to improve your LLM application one step at a time, walking you through evaluation and testing from initial development to post-production.

Below is a curated set of tutorials — each focused on real-world tasks, metrics, and best practices for reliable LLM evaluation. Start with the basics, or jump straight to your use case.

## Tutorials

import LinkCards from "@site/src/components/LinkCards";

<LinkCards
  tutorials={[
    {
      title: "Start Here: Install & Run Your First Eval",
      description:
        "Not sure where to begin? Click here to get started and run your first evaluation with DeepEval.",
      to: "/tutorials/tutorial-setup",
    },
    {
      title: "Meeting Summarizer",
      description:
        "Learn how to develop and evaluate a summarization agent using DeepEval.",
      to: "/tutorials/summarization-agent/introduction",
    },
    {
      title: "RAG QA Agent",
      description:
        "Evaluate your RAG pipeline for accuracy, relevance, and completeness.",
      to: "/tutorials/rag-qa-agent/introduction",
    },
    {
      title: "Medical Chatbot",
      description:
        "Test a healthcare-focused LLM chatbot for hallucinations and safety.",
      to: "/tutorials/medical-chatbot/introduction",
    },
  ]}
/>

## What You'll Learn

DeepEval tutorials cover the best practices for evaluating LLM applications across both development and production.

### Development Evals

You'll learn how to:

- Select evaluation metrics that align with your task
- Use `deepeval` to measure and track LLM performance
- Interpret results to tune prompts, models, and other system hyperparameters
- Scale evaluations to cover diverse inputs and edge cases
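To make the loop above concrete, here's a minimal sketch in plain Python. This is **not** DeepEval's API — the `exact_match` function is a hypothetical stand-in for the richer, LLM-based metrics you'll use in the tutorials, and the dataset is made up:

```python
# Conceptual sketch of a development eval loop (illustration only).
# exact_match is a hypothetical metric standing in for DeepEval's
# LLM-based metrics; real evals use far richer scoring.

def exact_match(expected: str, actual: str) -> float:
    """Score 1.0 if the output matches the reference, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

# A small, made-up dataset: inputs, expected outputs, and model outputs.
test_cases = [
    {"input": "What is 2 + 2?", "expected": "4", "actual": "4"},
    {"input": "Capital of France?", "expected": "Paris", "actual": "paris"},
    {"input": "Largest planet?", "expected": "Jupiter", "actual": "Saturn"},
]

# Measure: score every case, then aggregate to track performance over time.
scores = [exact_match(tc["expected"], tc["actual"]) for tc in test_cases]
average = sum(scores) / len(scores)

# Interpret: failing cases point at prompts or hyperparameters to tune.
failures = [tc["input"] for tc, s in zip(test_cases, scores) if s < 1.0]

print(f"average score: {average:.2f}")  # average score: 0.67
print(f"failing inputs: {failures}")    # failing inputs: ['Largest planet?']
```

The shape of the loop — score each case, aggregate, inspect failures, tune — is exactly what the tutorials do with `deepeval`, just with real metrics instead of this toy one.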

### Production Evals

You'll also see how to:

- Continuously evaluate your LLM's performance in production
- Run A/B tests on different models or configurations using real data
- Feed production insights back into your development workflow to improve future releases
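An A/B comparison like the one above can also be sketched in a few lines of plain Python — again purely illustrative, not DeepEval's API. The model outputs are made up, and `judge` is a hypothetical scoring function standing in for a proper evaluation metric:

```python
import statistics

# Hypothetical scorer standing in for an evaluation metric: here, a crude
# word-count proxy for answer completeness (illustration only).
def judge(answer: str) -> float:
    return min(len(answer.split()) / 10, 1.0)

# Made-up outputs from two model configurations on the same inputs.
outputs_a = ["Paris is the capital of France.", "Yes."]
outputs_b = ["Paris.", "Yes, returns within 30 days are refunded in full."]

mean_a = statistics.mean(judge(o) for o in outputs_a)
mean_b = statistics.mean(judge(o) for o in outputs_b)

print("A" if mean_a > mean_b else "B", "wins on this sample")
# → B wins on this sample
```

In production you'd replace `judge` with a real metric and the hard-coded lists with live traffic samples, but the comparison logic stays this simple.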

:::tip
LLM evaluation isn't a one-time step — it's a continuous loop. Production data sharpens development, and development precision strengthens production. That's why it's crucial to do both — and DeepEval helps you do just that.
:::

<details>

<summary>
  <strong>
    Here are a few key terms to keep in mind for LLM evaluations
  </strong>
</summary>

- **Hyperparameters**: The configuration values that shape your LLM application. This includes system prompts, user prompts, model choice, temperature, chunk size (for RAG), and more.
- **System Prompt**: A prompt that defines the overall behavior of your LLM across all interactions.
- **Generation Model**: The model used to generate responses — this is the LLM you're evaluating. Throughout the tutorials, we'll simply call it the _model_.
- **Evaluation Model**: A separate LLM used to score, critique, or assess the outputs of your generation model. This is **not** the model being evaluated.

</details>

## What DeepEval Offers

DeepEval supports a wide range of LLM evaluation metrics tailored to different use cases, including:

- **Retrieval-Augmented Generation (RAG) applications**
- **Conversational applications**
- **Agentic applications**

[Click here](https://deepeval.com/docs/metrics-introduction) to explore all the metrics `deepeval` offers.

Throughout these tutorials, we'll evaluate a variety of use cases with `deepeval` using real-world best practices. Your specific use case may differ — and that's expected. The evaluation approach remains the same: **define your criteria, choose the right metrics, and iterate based on the results.**

## Who This Is For

Whether you're building chatbots, summarizers, or LLM-powered agent systems, these tutorials are designed for:

- Developers shipping LLM features in real products
- Researchers testing prompts or model variations
- Teams optimizing LLM outputs at scale

From early experimentation to managing LLMs in production, these tutorials will help you test reliably, iterate faster, and ship with more confidence.

Want to get started right away? [Click here](#tutorials) to browse the list of available tutorials.
